NOTE: (9/30/2003) It's hard to believe, but I still get Procmail questions from readers of this tutorial, after six years! At this point, most of the questions I get are about how to filter spam with Procmail. I've included some links on this issue at the end of the tutorial.
This file is currently maintained by Ian Soboroff. I can be reached at ian@umbc.edu. Please feel free to mail me concerning questions, additions, or corrections.
Since there seems to be a bit of confusion on this point, I didn't write Procmail. I wish I had, since it's a fine piece of software, but I only wrote this tutorial. I am not a source for Procmail software distributions, or for help on compiling or installing it. You can get the software via FTP, and you should find manual pages and such with the distribution there. Once you've got Procmail set up (or confirmed that it is already set up at your site!), come on back and follow this tutorial.
You control Procmail yourself, through a file that you put in your home directory. This Web page will guide you through the complexities of writing this file.
This web page is meant to cover the basics. First, I'll walk through a sample filter setup. After that, I'll build a set of filters from scratch as a tutorial. This should be sufficient to get you up and running, using most of Procmail's normal features.
Procmail has several manual pages (online help); their titles and how to read them are discussed at the end. I've also included links to a couple of spam-filtering utilities; for more, you might try Googling for "spam filter procmail".
Currently, these UMBC systems are already running Procmail. All you need to do is compose a special file, called .procmailrc (don't forget that leading dot!), which describes the sorting criteria. Once you have this file in your $HOME directory, Procmail will automatically be run on any incoming mail you receive.
The whole trick to Procmail is writing the .procmailrc file. However, to the beginner, the format may look like some magical incantation, so I'll start with a small example (actually, an excerpt from my personal .procmailrc!) and walk through it. This is going to entail discussion of a lot of particulars and details, but don't worry; if things seem to digress or just plain stop making sense, odds are they'll be explained more fully later. After that, I'll construct a new .procmailrc, tutorial-style.
Side Note -- a bit of Unix trickery
Files in Unix that begin with a dot '.' are hidden files. So, when you use the ls command to view the files in your home directory, you may not see the .procmailrc file, or any other so-called "dot-files", right away. To see hidden files in your directory, use the '-a' option, as in ls -a. The '-a' stands for "all files," and will show you both hidden and visible files in one listing.
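If you'd like to see the difference for yourself without touching your real home directory, here is a small sketch that works in a scratch directory. The directory and the file name notes.txt are made up for the illustration.

```shell
# Make a throwaway directory holding one visible file and one dot-file.
scratch=$(mktemp -d)
touch "$scratch/notes.txt" "$scratch/.procmailrc"

# Plain ls skips names that begin with a dot.
plain=$(ls "$scratch")
echo "ls:    $plain"

# ls -a lists everything: the entries . and .. plus the dot-file.
all=$(ls -a "$scratch")
echo "ls -a:" $all

# Clean up the scratch directory.
rm -r "$scratch"
```

On some systems the superuser sees dot-files even with plain ls, so try this from an ordinary account.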
    # .procmailrc
    # routes incoming mail to appropriate mailboxes

    PATH=/usr/bin:/usr/local/bin
    MAILDIR=$HOME/.mailspool    # all mailboxes are in .mailspool/
    DEFAULT=$HOME/.mailspool/ian
    LOGFILE=/dev/null
    SHELL=/bin/sh

    # Put mail from DC-Linux mailing list into mailbox dclinux
    :0:
    * ^(From|Cc|To).*dc-linux
    dclinux
Now, don't panic, it's not as bad as it looks. A .procmailrc has two parts, assignments and recipes. The assignments set up variables so that Procmail knows where programs and mailboxes are; that's the top part. The recipe is the incantation at the bottom. Anything preceded by a hash mark (#) is a comment, and is ignored.
Assignments
The assignments section tells Procmail where to find things, such as
your mailboxes, or programs that it might need to run. The set of
assignments above covers pretty much everything most users should need;
the full set is discussed in the procmailrc man page.
Here are descriptions of the assignments in the excerpt above. They take the format variable-name = value.
PATH=/bin:/usr/bin:/usr/local/bin
MAILDIR=$HOME/.mailspool # all mailboxes are in .mailspool/
DEFAULT=$HOME/.mailspool/ian
LOGFILE=/dev/null
LOGFILE=$MAILDIR/log.`date +%y-%m-%d`
SHELL=/bin/sh
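The second LOGFILE line above uses backquotes, which run a command and paste its output into the value, so the log file gets a new name each day. The sketch below shows what that expands to at a shell prompt. The maildir path is a made-up stand-in, and $(...) is simply the modern shell spelling of backquotes.

```shell
# date +%y-%m-%d prints a two-digit year, month and day, e.g. 03-09-30.
stamp=$(date +%y-%m-%d)

# Hypothetical mail directory, used here only to build an example name.
maildir=/home/ian/.mailspool
logfile="$maildir/log.$stamp"
echo "$logfile"
```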
Recipes have the following format:
    :0 [flags] [: [lock-file] ]
    zero or more conditions
    one action line
The flags and lock-file business I'll cover later. The idea is that if the conditions are met, the action is performed. Now, let's look again at the simple recipe from above, which filters my mail from a DC-area Linux users group into its own mailbox:
    # Put mail from DC-Linux mailing list into mailbox dclinux
    :0:
    * ^(From|Cc|To).*dc-linux
    dclinux
The action line in this case is simple: it's just 'dclinux', the name of the folder to put the mail into. The action could also be an address to forward the mail to, or a program to start, or even a block of commands. We'll see more complex examples later.
Conditions tell Procmail what to look for in a mail message. Each one begins with a '*', and the rest is a pattern to look for. If part of the message matches this pattern, then Procmail will apply the action. The pattern is called a regular expression, and takes some explaining. Before I dive in, here is what this pattern says in plain English:
    at the beginning of a line, 'From' or 'Cc' or 'To', followed by some number of characters, followed by 'dc-linux'.
Thus, this pattern would match messages with 'dc-linux' in the From, Cc, or To lines of the header. Neat, huh?
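You can get a feel for this pattern at the shell prompt with grep. This is only a stand-in for Procmail's own matcher: -E selects egrep-style patterns and -i ignores case. The header below is invented for the example.

```shell
# A made-up message header.
header='From: alice@example.org
To: dc-linux@example.net
Subject: meeting tonight'

# Print whichever header lines the condition's pattern matches.
matched=$(printf '%s\n' "$header" | grep -Ei '^(From|Cc|To).*dc-linux')
echo "$matched"
```

Only the To line is printed, since it is the only From, Cc, or To line that mentions dc-linux.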
Most .procmailrc files have more than one recipe. The rule is, unless
you tell it otherwise, Procmail will stop at the first recipe that
matches the message. I'll show how to get around this in the
tutorial.
Regular Expressions
Regular expressions are actually reasonably simple, once you get the
hang of them.
First and foremost, any character that isn't a special character mentioned below matches itself. This includes all letters and numbers, and some punctuation. That is to say, the regular expression
    Bob
matches the string "Bob". In Procmail, regular expressions are case insensitive, so this will also match "bob", or "bOb", or "BOB", for that matter.
A dot '.' matches any character except a newline. So, the expression
    .ob Jones
will match the string "Bob Jones", but also "Rob Jones" and "Qob Jones", too.
Any character followed by a star '*' matches that character repeated 0 or more times. Thus,
    Bob* Jones
matches "Bo Jones", "Bob Jones", or "Bobbbbbbbbbb Jones". The expression ".*" matches any number of unspecified characters.
Related are the '+' and '?' modifiers. The expression "a+"
matches one or more a's. The expression "a?"
matches zero or one a.
You can use parentheses to group an expression for use with a modifier. So, the expression
    B(ob)+
matches "Bob", and also "Bobobobobobob".
If one character in a pattern could be one of several, you can use a character class. For example:
    Part [abcd]
matches "Part a", "Part b", "Part c", and "Part d". If the first character of a class is '^', the class matches anything _not_ in the class. For example:
    [^aeiou]+
matches any series of one or more non-vowel characters.
One more operator is the '|' (vertical-bar) character. It is used to match either of two expressions. For example:
    Bob|Joe
will match "Bob" or "Joe".
The last two special characters I want to mention are '^' and '$'. Incidentally, here I'm referring to a '^' that isn't inside a character class. '^' means the beginning of a line, and '$' means the end of one. So,
    ^To:
would match the letters 'To:' at the beginning of a line. If that looks suspiciously like part of a mail header, consider it a preview. ;-)
This covers most of the special characters that Procmail uses in regular expressions. There are a few others, but the manual pages for egrep and procmailrc explain them well, and if I'm not careful this will turn into a help sheet on regular expressions!
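The examples in this section can be tried out at the shell prompt. The sketch below uses grep -Ei as an approximation of Procmail's matching, not an exact replica; the helper name matches is made up for this illustration.

```shell
# matches PATTERN STRING: succeed if STRING contains a match for PATTERN.
# -E gives egrep-style syntax; -i mimics Procmail's case insensitivity.
matches() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

matches 'Bob'         'bOb'           && echo 'plain text, any case'
matches '.ob Jones'   'Rob Jones'     && echo 'dot matches any character'
matches 'Bob* Jones'  'Bo Jones'      && echo 'star allows zero repeats'
matches 'B(ob)+'      'Bobobob'       && echo 'group repeated with +'
matches 'Part [abcd]' 'Part c'        && echo 'character class'
matches '^[^aeiou]+$' 'rhythm'        && echo 'negated class'
matches 'Bob|Joe'     'Joe'           && echo 'alternation'
matches '^To:'        'To: you'       && echo 'anchored at line start'
matches '^To:'        'Reply-To: you' || echo 'no match: To: is not at line start'
```

Each line prints its message, including the last one, where the pattern fails because 'To:' appears in the middle of the line rather than at the start.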
Now, what is all this about matching, anyway? Well, now you should be able to see that a regular expression in a recipe represents a pattern in a mail message. We will use regular expressions to tell Procmail what patterns to look for. Next, I'll walk through the construction of several recipes, and you'll see how it's done.
We're going to use the same assignments section as described above. Unless you have your mailbox in an odd place, or want to use logs, you'll probably find what I've included to be just fine.
Let's say we belong (as I do) to the mailing list Israeline, which sends out daily news clipping collections from Israeli news sources. It might be nice to have these digests automatically placed in a special mail folder, which we'll call 'israel'.
Mail from this list comes addressed like so:
    To: Multiple recipients of list <israeline@nysernet.org>
This has changed in the past, but it always has that address in it, so we'll use part of that as our pattern. Our pattern will be to match "a line starting with 'To:' and containing 'israeline'", or
    ^To:.*israeline
The recipe will look like this:
    :0: # the last colon means use a lockfile
    * ^To:.*israeline
    israel # put these messages in the 'israel' folder
One thing to remember, by the way, is don't put any comments on a condition line. If you do, Procmail will think the comment is part of the pattern!
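To see why a comment on a condition line is trouble, here is a rough sketch using grep -Ei in place of Procmail's matcher. The comment text '# my list' is invented for the example.

```shell
to_line='To: Multiple recipients of list <israeline@nysernet.org>'

# The condition's pattern as intended.
printf '%s\n' "$to_line" | grep -Eiq '^To:.*israeline' \
    && clean='match' || clean='no match'

# The same pattern with a comment left on the end of the line:
# the comment text is now part of what must be found.
printf '%s\n' "$to_line" | grep -Eiq '^To:.*israeline # my list' \
    && commented='match' || commented='no match'

echo "clean pattern:     $clean"
echo "pattern + comment: $commented"
```

The second test fails because no header line contains the literal text ' # my list' after 'israeline'.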
Ok, now what's all this about a 'lockfile'? Well, suppose two israeline messages came in at about the same time. It's very possible that the mail system would fire up two copies of Procmail, and each would try to write its message to your 'israel' folder! By using a lockfile, the first Procmail that gets run will 'lock' the folder so only it can write to it; any other Procmail trying to write to that folder will have to wait until the first is finished. Using lockfiles may slow down your mail delivery ever so slightly, but it's better than mangled mail.
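The idea behind a lockfile can be sketched in a few lines of shell. This is an illustration of the concept only, not how Procmail does its locking. The acquire helper and the file name israel.lock are made up, and the lock is taken by creating a file that must not already exist.

```shell
# A scratch directory to hold the pretend lockfile.
lockdir=$(mktemp -d)
lock="$lockdir/israel.lock"

# acquire: try to create the lockfile. With noclobber (set -C) the
# redirection fails if the file is already there, so only one caller wins.
acquire() {
    ( set -C; : > "$lock" ) 2>/dev/null
}

acquire && first='got lock'  || first='must wait'
acquire && second='got lock' || second='must wait'
echo "first writer:  $first"
echo "second writer: $second"

# Removing the lockfile lets the next writer in.
rm -f "$lock"
acquire && third='got lock' || third='must wait'
echo "after release: $third"

rm -rf "$lockdir"
```

The second writer is turned away while the lock exists, and succeeds once the first has released it.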
Now, suppose your colleague Bob likes to send you lists of jokes that he finds around the Net every so often, usually with "joke" or "funny" in the subject line. We don't want this frivolity cluttering our otherwise clean, businesslike work mailbox, so we'll forward it to our account at the university. The tricky part is we want to make sure we don't forward Bob's vital business memos too. We'll use two conditions in the recipe; one to match mail from Bob, and one to match the subject. Here's how the recipe looks:
    :0 # forward jokes to my wossamatta u. account
    * ^From.*bob
    * ^Subject:.*(joke|funny)
    ! rocky@wossamatta.edu
Three things to note here. First, forwarding mail is done with the '!' at the beginning of the action line, followed by the address. Second, notice that I don't have a colon after 'From' in that condition. This is a quirk of mail headers; there are header From lines with and without colons, so leaving it off is the safest bet. Third, since we're just forwarding the mail and not writing to a file, we don't need a lockfile.
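A recipe with two conditions fires only when both match. Here is a rough shell sketch of that logic, again with grep -Ei standing in for Procmail; the helper name should_forward and the sample headers are invented.

```shell
# should_forward HEADER: succeed only if both conditions match.
should_forward() {
    printf '%s\n' "$1" | grep -Eiq '^From.*bob' &&
    printf '%s\n' "$1" | grep -Eiq '^Subject:.*(joke|funny)'
}

joke='From: bob@example.com
Subject: Funny thing I found'

memo='From: bob@example.com
Subject: Q3 budget memo'

# A From line without a colon, as some mail systems write it.
envelope='From bob@example.com Tue Sep 30 10:00:00 2003
Subject: a joke for you'

should_forward "$joke"     && echo 'joke:     forward'
should_forward "$memo"     || echo 'memo:     keep'
should_forward "$envelope" && echo 'envelope: forward'
```

Bob's memo stays put because its subject fails the second test, and the colon-less From line is still caught by the first.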
Of course, even though I'm sending the joke mail off somewhere else, I'd still like to read the jokes, even if they're not in my mailbox! We could print out those messages, as well as forwarding them; that way we could read them and no one would know...
The new thing here, besides having an action run a program, is that we're going to modify the above recipe so we have two actions. We'll do this with a technique called nesting. Here's the modified recipe:
    :0: # forward jokes to my wossamatta u. account
    * ^From.*bob
    * ^Subject:.*(joke|funny)
    {
        :0 c
        ! rocky@wossamatta.edu

        :0
        | lpr -Pacsps
    }
Instead of an action line, we're using a nested block, which is enclosed in braces. This block is like a secondary .procmailrc file; in it, we can put any number of recipes, which will only be used if the 'parent' recipe applies.
The first recipe in the block is to send off the mail. It uses a flag in its first line, a 'c'. The 'c' flag means to copy the mail, so that the next recipe also gets a copy of the mail, since ordinarily, mail only goes to the first recipe that fits it. The 'c' flag allows us to apply two recipes to a single message.
We send a message to a program using the vertical bar '|' symbol to start off the action line. This means "send the message as input to the following program." In Unix this is called a "pipe". So, here we're piping the mail message to the program "lpr", which will print the message on the printer "acsps".
In a similar way, let's archive the messages we get from another mailing list, called (let's say) "junk". So, while we deliver the messages to our mailbox, we'll keep the body of the messages in a compressed file, which we could unpack later.
    :0 bc: # archive things sent to junk mailing list
    * ^To:.*junk
    | gzip >> junk-archive.gz
Here we're using two flags. The 'b' flag means that the action line will just take the body of the message, and not the header. The 'c' flag, again, means to just take a copy of the message for this recipe, and pass the message along to the recipes after. We're using that because we want to archive the message, but we'd also like it to be filed in our mail inbox as usual.
The pipe is another Unixism, telling Procmail to send the message to the compression program "gzip", which will squash the text and put it at the end of the file "junk-archive.gz". This file can be uncompressed for later reading with the "gunzip" command, like so:
    gunzip junk-archive.gz
This covers most of the basic recipes that one might create. The limit from here is only your own needs. The manual pages (described below) will be your best course now. The page called procmailrc describes all the flags you can use, and the page called procmailex contains more examples.
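The archiving recipe above can be tried by hand. In this sketch the two messages and the body helper are invented; sed strips everything up to the first blank line, which is roughly what the 'b' flag hands to the action. I use gunzip -c here so the archive is left in place and the text goes to the screen.

```shell
work=$(mktemp -d)
archive="$work/junk-archive.gz"

# body MESSAGE: drop the header, i.e. everything through the first blank line.
body() {
    printf '%s\n' "$1" | sed '1,/^$/d'
}

msg1='To: junk@example.org
Subject: first

body one'

msg2='To: junk@example.org
Subject: second

body two'

# Each message body is compressed and added to the end of the archive.
body "$msg1" | gzip >> "$archive"
body "$msg2" | gzip >> "$archive"

# gunzip unpacks all the appended pieces in order.
restored=$(gunzip -c "$archive")
echo "$restored"

rm -rf "$work"
```

Both bodies come back out in the order they arrived, which is why appending with '>>' works for an archive like this.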
As a sort of quiz, look at the following recipe of mine and try to figure out what it does. I used it as the first recipe in my .procmailrc when I went traveling recently:
    :0 Wc: vacation.lock
    |/usr/sbin/vacation ian
(hint: look at the manual page for the program 'vacation', and also look at the example in 'procmailex' about sending automatic replies)
To read a manual page, use the man command:
    man topic
where topic is usually a command name. Procmail has several man pages which explain aspects of the program, among them procmailrc and procmailex, both mentioned above. For example:
    umbc9[1]% man procmailex
The regular expressions used by Procmail are the same as those used by the Unix program egrep; these in turn are an extension of the set used by ed, a time-worn editor program. ed's man page is the online bible for regular expressions. egrep's man page discusses the extensions. The procmailrc man page gives a summary.
Procmail is written by Stephen R. van den Berg, at RWTH-Aachen, Germany. The latest version can be found at ftp.informatik.rwth-aachen.de
Also, the comp.mail.misc newsgroup occasionally has traffic on Procmail and mail filtering in general.
(added 9/2003) I get a lot of questions on spam filtering with Procmail. I don't recommend trying to write individual scripts by hand... the spammers are too good, and you'll spend all the time you save writing new Procmail scripts. Instead, you should consider using an external filter, whose output you can process with Procmail. Here are a couple spam-filtering (and generic filtering) packages you can easily use along with Procmail, and which will do a much better job than hand-tuned filtering scripts.
Good luck!